Abstract

Linear mixture models are commonly used to represent a hyperspectral datacube as linear
combinations of endmember spectra. However, determining the number of endmembers for images
embedded in noise is a crucial task. This paper proposes a fully automatic approach for estimating the
number of endmembers in hyperspectral images. The estimation is based on recent results of random 
matrix theory related to the so-called spiked population model. More precisely, we study the gap 
between successive eigenvalues of the sample covariance matrix constructed from high dimensional 
noisy samples. The resulting estimation strategy is unsupervised and robust to correlated noise. This 
strategy is validated on both synthetic and real images. The experimental results are very promising 
and show the accuracy of this algorithm with respect to state-of-the-art algorithms. 
I. Introduction


Unmixing techniques can provide fundamental information when analyzing multispectral or 
hyperspectral images with limited spatial resolution. In spite of almost 50 years of research 
in this area, there has been a surge of interest in the last few years within the area of 
remote sensing and hyperspectral imaging. Even with an ever-increasing spatial
resolution, each pixel (or spectrum) in a hyperspectral image is generally associated with 
several pure materials. Each spectrum can thus be seen as a mixture of spectral signatures 
called endmembers with respective proportions called abundances. While non-linear unmixing 
techniques have been recently investigated, the linear mixing model is widely accepted
because of its natural physical interpretation. This model assumes that each spectrum is a 
convex combination of the endmember spectra. Unmixing hyperspectral images consists of 
three stages: (i) determining the number of endmembers and possibly projecting the data onto 
a subspace of reduced dimension [7], (ii) extracting endmember spectra and (iii)
estimating their abundances. These stages can be performed separately or jointly.
Determining the number of endmembers, or the signal subspace dimension, then
appears as a fundamental step in order to achieve endmember spectrum determination and 
abundance estimation. This paper considers this problem of estimating the signal subspace 
dimension of hyperspectral images. 

Estimating the number of endmembers present in a scene has been described under several 
names and under different methodological frameworks. The most well known definitions are 
based on the eigenvalues of the sample (observation) covariance matrix, with the so-called 
“virtual dimension” (VD), as well as many variants including the “intrinsic dimension” and 
the “effective dimension”. The VD is estimated by the so-called Harsanyi-Farrand-Chang
(HFC) method, which relies on Neyman-Pearson detection theory applied to the difference
between the eigenvalues of the sample covariance matrix and its non-centered counterpart
(i.e., the matrix of second-order moments). The HFC, and its noise-whitened version
(NWHFC), are generally more efficient than algorithms based on model selection criteria
such as the Akaike information criterion (AIC) and the minimum description length
(MDL) [17], [18], especially in the presence of colored noise. The idea of evaluating the
differences between the eigenvalues of the covariance and the correlation matrices has also
been exploited in other algorithms such as [19]. An unsupervised approach for hyperspectral
subspace identification, called Hysime, has also been proposed. This method consists of
minimizing a cost function whose aim is to reduce the noise power. Other methods only use
the sample covariance matrix without considering the correlation matrix. For instance, the noise
subspace projection method considers a Neyman-Pearson test to separate noise components
from signal components based on a whitened covariance matrix. The idea is that the noise
eigenvalues are equal to unity while the signal eigenvalues are greater than one.

Random matrix theory (RMT) is a universal multivariate statistics tool that has been used
successfully in many fields. Recently, an approach based on RMT has been applied to estimate
the number of endmembers [20]. This method (denoted as RMT) first estimates the noise
covariance matrix in order to remove colored noise effects. Then, based on the whitened
covariance eigenvalues, this method proposes a theoretical threshold to determine the number
of endmembers in the image. However, this method appears to be sensitive to noise [20],
[21], which might reduce the estimation performance. Moreover, it has been shown in [22]
that many noise estimation algorithms are sensitive to noise correlation. In the presence of
such correlations, the RMT algorithm [20] may provide results of poor quality.

The motivation of our work is to provide a consistent and unsupervised estimator of
the number of endmembers by considering a general scenario, where the additive noise
components are not identically distributed. The main advantage of the proposed approach,
with respect to (w.r.t.) the RMT algorithm, is its robustness in the presence of correlated
noise. Similarly to the RMT and Hysime approaches, our method starts by estimating the
noise covariance matrix in order to remove its effect from the sample/observation covariance
matrix. The next step is inspired by recent results on spiked population models (SPM).
Indeed, [23] proposed a method based on the gap between successive eigenvalues of the
sample covariance matrix. By considering sorted eigenvalues, the main idea is that the gap
between eigenvalues (of a whitened covariance) is larger in the presence of a signal while it
is reduced for noise components. Building on this idea, an automatic threshold is obtained
to separate the signal from noise components.


Contributions and comparisons 

The main objective of the paper is to provide an unsupervised algorithm for estimating
the number of endmembers in hyperspectral images. The proposed approach generalizes the
consistent estimator proposed in [23] for independent and identically distributed (i.i.d.) noise
to the colored Gaussian noise case. This eigen-gap approach is based on a consistent estimator
while the RMT algorithm [20] is not fully consistent as stated in [23]. The proposed approach
appears to be more robust to correlated noise and to small image sizes. These statements are
validated on both synthetic and real hyperspectral images.

The paper is organized as follows. Section II introduces the hyperspectral mixing model,
the SPM and rank estimation methods for an SPM. Section III introduces our algorithm, whose
performance is evaluated in Section IV on synthetic images. Results on real hyperspectral
images are presented in Section V. Conclusions and perspectives for future work are finally
reported in Section VI.


II. Problem formulation 

A. Linear mixture model 

The linear mixture model (LMM) assumes that each pixel spectrum $y_n$, of size $L \times 1$, is
a linear combination of $R$ endmembers $m_r$, $r \in \{1, \dots, R\}$, corrupted by an additive noise
$e_n$ as follows

$$ y_n = \sum_{r=1}^{R} a_{rn} m_r + e_n = M a_n + e_n \qquad (1) $$

with $e_n \sim \mathcal{N}(0_L, \Sigma)$ a Gaussian noise, $\Sigma$ the noise covariance matrix, $0_L$ an $L \times 1$
vector of zeros, $a_n = [a_{1n}, \dots, a_{Rn}]^T$ the $R \times 1$ abundance vector of the $n$th pixel and
$M = [m_1, \dots, m_R]$ an $L \times R$ matrix gathering the endmember spectra. The abundance
vector $a_n$ contains proportions satisfying the positivity and sum-to-one (PSTO) constraints
$a_{rn} \geq 0$, $\forall r \in \{1, \dots, R\}$ and $\sum_{r=1}^{R} a_{rn} = 1$. Considering $N$ pixels gathered in the $L \times N$
matrix $Y$, the LMM can be written as follows

$$ Y = MA + E \qquad (2) $$

where $A$ is an $R \times N$ matrix of abundances, and $E$ an $L \times N$ matrix of noise samples.

Rank estimation can be based on an eigenvalue analysis of the covariance matrix of $Y$.
Assuming independence between the signal counterpart $S = MA$ and the noise $E$ leads to

$$ R_Y = R_S + \Sigma \qquad (3) $$

where $R_Y$ and $R_S$ are the covariance matrices of $Y$ and $S$, respectively. In this paper, we
are interested in estimating the number $R$ of endmembers, which is equal to $K + 1$, where
$K = \mathrm{rank}(R_S)$. Indeed, the signal lies in a subspace of dimension $R - 1$ because of the
PSTO constraints.
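
As an illustration of the model (1)-(3), the following minimal sketch draws synthetic pixels under the LMM and checks numerically that the mean-removed signal covariance has rank $R - 1$. The function name `simulate_lmm`, the random endmember matrix and all numerical values are assumptions chosen for the example, not taken from the experiments of this paper.

```python
import numpy as np

def simulate_lmm(M, N, noise_cov, seed=None):
    """Draw N pixels y_n = M a_n + e_n with abundances uniform on the simplex."""
    rng = np.random.default_rng(seed)
    L, R = M.shape
    # Dirichlet(1, ..., 1) draws are non-negative and sum to one (PSTO constraints).
    A = rng.dirichlet(np.ones(R), size=N).T                          # R x N
    # Zero-mean Gaussian noise with covariance noise_cov.
    E = rng.multivariate_normal(np.zeros(L), noise_cov, size=N).T   # L x N
    return M @ A + E, A

# Hypothetical endmember matrix: L = 20 bands, R = 4 endmembers.
M = np.random.default_rng(0).uniform(size=(20, 4))
Y, A = simulate_lmm(M, N=500, noise_cov=1e-4 * np.eye(20), seed=1)

# Rank of the (mean-removed) signal covariance, using a relative tolerance.
eig = np.linalg.eigvalsh(np.cov(M @ A))
K = int(np.sum(eig > 1e-10 * eig.max()))   # K = R - 1 = 3
```

The rank is counted with an explicit relative tolerance because the numerically zero eigenvalues of a sample covariance are only zero up to rounding.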
B. Spiked population model

A well-known model in RMT is the spiked population model. This model assumes that
the covariance matrix of interest has all its eigenvalues equal to $\sigma^2$ except a few eigenvalues
(known as spikes) as follows [23]

$$ \Lambda = \sigma^2 \Gamma \begin{bmatrix} \mathrm{diag}(\gamma_1, \dots, \gamma_K) & 0_{K,L-K} \\ 0_{L-K,K} & I_{L-K} \end{bmatrix} \Gamma^T \qquad (4) $$

where $\Lambda$ is the covariance matrix, $\Gamma$ is an $L \times L$ orthogonal matrix, $0_{i,j}$ is the $i \times j$ matrix
of zeros and $I_L$ is the $L \times L$ identity matrix. Determining the number of endmembers can be
performed by computing the number of spiked eigenvalues of the covariance matrix $\Lambda$. For
this purpose, consider that $R_Y = \Lambda$ and denote its eigenvalues by $\lambda_k$ for $k = 1, \dots, L$.
Assuming $\Sigma = \sigma^2 I_L$ and denoting by $[\rho_1, \dots, \rho_K, 0, \dots, 0]^T$ the eigenvalue vector of $R_S$, (3) leads to

$$ \lambda_k = \begin{cases} \rho_k + \sigma^2, & \text{if } k \leq K \\ \sigma^2, & \text{otherwise} \end{cases} \qquad (5) $$

and (4) yields

$$ \rho_k + \sigma^2 = \gamma_k \sigma^2, \quad \text{for } k \leq K. \qquad (6) $$

Unfortunately, in many situations, the covariance matrix $R_S$ is unknown and the additive
noise is not necessarily identically distributed, contradicting the assumption $\Sigma = \sigma^2 I_L$. The
alternative proposed in this work builds an estimator of the number of endmembers, when
only the sample covariance matrix $\hat{R}_Y$ is known and $e_n$ is an additive independently and not
identically distributed zero-mean Gaussian noise sequence.
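
To make the structure of (4) concrete, the sketch below builds a spiked covariance matrix from a random orthogonal matrix and recovers its eigenvalues. The helper name `spiked_covariance` and the chosen spike values are hypothetical and serve only as an example.

```python
import numpy as np

def spiked_covariance(gammas, L, sigma2=1.0, seed=None):
    """Build sigma2 * Gamma * diag(gamma_1, ..., gamma_K, 1, ..., 1) * Gamma^T."""
    rng = np.random.default_rng(seed)
    # Random orthogonal matrix obtained from the QR factorisation of a Gaussian matrix.
    Gamma, _ = np.linalg.qr(rng.normal(size=(L, L)))
    diag = np.ones(L)
    diag[:len(gammas)] = gammas          # the K spikes, all other entries equal to one
    return sigma2 * Gamma @ np.diag(diag) @ Gamma.T

Lambda = spiked_covariance([8.0, 5.0, 3.0], L=10, sigma2=0.5, seed=0)
eigvals = np.sort(np.linalg.eigvalsh(Lambda))[::-1]
# eigvals holds 0.5 * [8, 5, 3] followed by seven values equal to 0.5, up to rounding.
```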


C. Rank estimation from an SPM 

Estimating the number of spikes from an SPM is an interesting problem that has found
many applications including chemical mixtures [24] and hyperspectral unmixing [20]. A
recent work proposed to investigate RMT to estimate the number of spikes or endmembers
in hyperspectral images [20]. This work builds on the estimator proposed in [24] in the
context of chemical mixtures. This method uses the following assumptions: (i) $N \to \infty$ and
$L \to \infty$ (or large values of $N$ and $L$) with $c = L/N > 0$ a positive constant, (ii) the noise


^1 In the presence of colored noise, an adequate procedure will be considered as shown in the following.


January 23, 2015 


DRAFT 













corrupting the data is Gaussian and independent of the signal, (iii) the signal covariance
matrix has a fixed rank $K$. Under these assumptions, the method [20], [24] is based on the
study of the asymptotic behavior of the largest eigenvalues of the sample covariance matrix
when both the dimension of the observations and the sample size grow to infinity at the same
rate. The main idea is that when the covariance matrix $\Lambda$ is a perturbed version of a finite
rank matrix, all but a finite number of eigenvalues of the covariance matrix are equal
to the i.i.d. noise variance. Based on this property and on [25], [26], a threshold that
separates the eigenvalues corresponding to the useful information from those corresponding
to the noise was derived in [20], [24] yielding


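$$ \hat{K} = \arg\min_{k} \left\{ \lambda_k < \sigma^2 \left( \frac{\beta_c}{N^{2/3}} s(\alpha) + (1 + \sqrt{c})^2 \right) \right\} - 1 \qquad (7) $$

where $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_L$ are the eigenvalues of the sample covariance matrix, $s(\alpha)$ can
be found by using the Tracy-Widom distribution, and

$$ \beta_c = (1 + \sqrt{c}) \left( 1 + \sqrt{c^{-1}} \right)^{1/3}. \qquad (8) $$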

This estimator is based on a sequence of nested hypothesis tests. By construction, this
estimator is not fully consistent as shown in [27]. In this paper, we are interested in deriving
a new estimator with better statistical properties.

One of the front-line research problems in RMT is the study of the gap between consecutive
eigenvalues [27]. Indeed, the eigenvalue differences can be used for the estimation
of the number of spikes under the following assumptions [23]: (i) $N$ and $L$ are related by the
asymptotic regime $N \to \infty$, $L/N \to c > 0$, (ii) the noise corrupting the data is Gaussian and
independent of the signal (to satisfy assumption 3.1 in [23]), (iii) the signal covariance matrix
has a fixed rank $K$, (iv) the eigenvalues of the sample covariance matrix are of multiplicity
one^2 and (v) $\gamma_1 > \dots > \gamma_K > 1 + \sqrt{c}$. Note first that using hypotheses (i) and (v), it has been
shown that the eigenvalues of the sample covariance matrices of spiked population models
satisfy almost surely


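$$ \lambda_k \xrightarrow[N \to \infty]{} \sigma^2 \phi(\gamma_k - 1) \qquad (9) $$

for each $k \in \{1, \dots, K\}$, while, for $k > K$,

$$ \lambda_k \xrightarrow[N \to \infty]{} \sigma^2 (1 + \sqrt{c})^2 \qquad (10) $$

where $\phi(x)$ is defined by

$$ \phi(x) = (x + 1) \left( 1 + \frac{c}{x} \right). \qquad (11) $$

^2 The general case of eigenvalues with multiplicity greater than one has also been considered in the literature.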


These results were used in [23], [28] to infer the number of components $K$ in the case where
$\sigma^2 = 1$. In the general case where $\sigma^2 \neq 1$, the authors of [23] stated that one should divide the
eigenvalues by the noise variance to apply the results obtained for $\sigma^2 = 1$. The estimation
method in [23] considers the following differences between successive eigenvalues

$$ \delta_k = \lambda_k - \lambda_{k+1}, \quad \text{for } k = 1, \dots, L - 1. \qquad (12) $$

The main idea is that, when approaching non-spiked values, the eigen-gap $\delta_k$ shrinks to small
values. Therefore, the rank $K$ (and thus the number of endmembers) can be estimated as follows

$$ \hat{K} = \min \{ k \in \{1, \dots, M\} : \delta_{k+1} < d_N \} \qquad (13) $$

where $M > K$ is a fixed integer (large enough), and $d_N \to 0$ is a threshold to determine.
According to [23], the consistency of this estimator is ensured if $d_N \to 0$ and $N^{2/3} d_N \to
+\infty$. The same authors proposed to use $d_N = \psi_N / N^{2/3}$ with $\psi_N = 4 \sqrt{2 \log(\log N)}$, which satisfies
the former conditions. The obtained algorithm is fully unsupervised in the sense that it does
not require tuning any parameter. The main difference between this strategy and [20], [24]
is that [23] builds a test statistic based on the gaps between successive eigenvalues and
not on the eigenvalues themselves. An important consequence is that theoretical
consistency of the estimator is ensured in the case of the gap approach, while the method described in [20], [24]
depends on a parameter $\alpha$ and is nearly consistent as stated in [23].
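
The eigen-gap rule (12)-(13) can be sketched in a few lines. This is an illustrative implementation for the white-noise case only; the function name `eigen_gap_rank`, the `max_rank` argument (playing the role of $M$) and the fallback of returning $M$ when no gap falls below the threshold are assumptions made for the example.

```python
import numpy as np

def eigen_gap_rank(eigvals, N, sigma2=1.0, max_rank=None):
    """Return the first k in {1, ..., M} such that delta_{k+1} < d_N, as in (13)."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1] / sigma2
    delta = lam[:-1] - lam[1:]           # delta[k - 1] stores delta_k of (12)
    d_N = 4.0 * np.sqrt(2.0 * np.log(np.log(N))) / N ** (2.0 / 3.0)
    M = len(delta) - 1 if max_rank is None else min(max_rank, len(delta) - 1)
    for k in range(1, M + 1):
        if delta[k] < d_N:               # delta[k] is delta_{k+1}
            return k
    return M                             # assumed fallback: no small gap found

# Three well separated spikes followed by tightly packed noise eigenvalues.
eigvals = [9.0, 6.0, 4.0, 1.30, 1.29, 1.28, 1.27]
K_hat = eigen_gap_rank(eigvals, N=10_000)   # K_hat == 3
```

For $N = 10^4$ the threshold $d_N$ is about 0.018, so the gaps of 0.01 between the last eigenvalues are classified as noise gaps while the gaps between the first four eigenvalues are not.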


III. Proposed algorithm 

The eigen-gap strategy assumes the noise to be i.i.d., which is not true when considering
hyperspectral images [29]. Therefore, we propose to use a preliminary step before
estimating the number of endmembers.


A. Noise estimation 

A great effort has been devoted to the noise estimation problem since it is essential for
many signal processing applications requiring whitening and/or dimension reduction. Among
these algorithms, we distinguish those assuming spatially homogeneous regions such as the
nearest neighbor difference (NND) [30], the geometrical based algorithm [31], and algorithms
estimating the noise such as the multiple regression based methods [32]. The NND
algorithm requires homogeneous areas that are not always available in hyperspectral images.
The Meer algorithm does not account for noise spectral correlation since it estimates the
noise variance for each band separately [33]. This paper considers the multiple regression
based method since it has been used in many subspace identification
algorithms, and has shown results similar to those of the residual method, as stated in
[33]. However, the proposed approach is still valid when considering other noise estimation
algorithms.

The multiple regression method assumes that the $\ell$th spectral band of each pixel vector
is connected to the $L - 1$ other bands by a linear model. More precisely, denoting as $y_\ell$ the
$N \times 1$ vector containing the pixel elements of the $\ell$th band, and $Y_{-\ell}$ the $(L - 1) \times N$ matrix
obtained by removing the $\ell$th row from the matrix $Y$, we assume that

$$ y_\ell = Y_{-\ell}^T b_\ell + \epsilon_\ell \qquad (14) $$

where $\epsilon_\ell$ is the modeling error vector of size $N \times 1$ and $b_\ell$ is the $(L - 1) \times 1$ regression
vector that is estimated using the least squares estimator

$$ \hat{b}_\ell = \left( Y_{-\ell} Y_{-\ell}^T \right)^{-1} Y_{-\ell} \, y_\ell. \qquad (15) $$

The noise vector is then estimated by $\hat{\epsilon}_\ell = y_\ell - Y_{-\ell}^T \hat{b}_\ell$ and its covariance matrix is given by

$$ \hat{\Sigma} = (\hat{\epsilon}_1, \dots, \hat{\epsilon}_L)^T (\hat{\epsilon}_1, \dots, \hat{\epsilon}_L) / N. \qquad (16) $$

Once the noise covariance matrix has been estimated, a whitening procedure can be performed
as described in the next section.
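
A minimal sketch of the multiple regression estimator (14)-(16) is given below. The function name `estimate_noise_covariance` is an assumption, and `numpy.linalg.lstsq` is used in place of the explicit inverse of (15); both give the same least squares solution when $Y_{-\ell} Y_{-\ell}^T$ is invertible.

```python
import numpy as np

def estimate_noise_covariance(Y):
    """Multiple-regression noise covariance estimate for an L x N data matrix Y."""
    L, N = Y.shape
    residuals = np.empty((N, L))
    for l in range(L):
        y_l = Y[l, :]                            # N samples of band l
        Y_rest = np.delete(Y, l, axis=0)         # (L - 1) x N, band l removed
        # Least squares regression of band l on the other bands, as in (15).
        b_l, *_ = np.linalg.lstsq(Y_rest.T, y_l, rcond=None)
        residuals[:, l] = y_l - Y_rest.T @ b_l   # estimated noise of band l
    return residuals.T @ residuals / N           # L x L matrix of (16)
```

The loop solves one regression per band, which is simple to read but costs $L$ least squares problems; faster formulations exist and are not shown here.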


B. Rank estimation 

Before applying the eigen-gap test, let us first remove the effect of colored noise. This can
be achieved by whitening the observed pixels $Y$ using the estimated noise covariance matrix
$\hat{\Sigma}$. However, it has been shown in [20], [21] that this procedure leads to an overestimated
subspace dimension $K$ when combined with RMT approaches. Therefore, we will consider the
strategy used in [20]. Under the assumption that $v_i^T w_i \neq 0$, $\forall i = 1, \dots, L$, it has been shown
in [20] that

$$ \lambda_k = \begin{cases} \rho_k + \dfrac{v_k^T \Sigma w_k}{v_k^T w_k}, & \text{if } k \leq K \\ \dfrac{v_k^T \Sigma w_k}{v_k^T w_k}, & \text{otherwise} \end{cases} \qquad (17) $$

where $v_k$ and $w_k$ denote the eigenvectors of $R_Y$ and $R_S$, respectively. Note that (17) is
similar to (5) except that the noise variance has been estimated differently as

$$ \hat{\sigma}_k^2 = \frac{v_k^T \hat{\Sigma} w_k}{v_k^T w_k}. \qquad (18) $$

Equations (17) and (18) require the computation of the eigenvectors of $R_S$. The covariance
matrix $R_S$ is unknown but can be estimated using (3) as follows

$$ \hat{R}_S = \hat{R}_Y - \hat{\Sigma}. \qquad (19) $$
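
The relation (17) follows directly from (3) and the eigenvector definitions. The short derivation below is a sketch of that step, using only the symmetry of $R_S$ and the assumption $v_k^T w_k \neq 0$.

```latex
\begin{align*}
R_Y v_k &= \lambda_k v_k, \qquad R_S w_k = \rho_k w_k, \\
w_k^T R_Y v_k &= \lambda_k \, w_k^T v_k
  = w_k^T (R_S + \Sigma) v_k
  = \rho_k \, w_k^T v_k + w_k^T \Sigma v_k, \\
\lambda_k &= \rho_k + \frac{w_k^T \Sigma v_k}{w_k^T v_k}
  = \rho_k + \frac{v_k^T \Sigma w_k}{v_k^T w_k}.
\end{align*}
```

For $k > K$ the eigenvalue $\rho_k$ of $R_S$ is zero, which gives the second case of (17). When $\Sigma = \sigma^2 I_L$ the ratio reduces to $\sigma^2$ and (5) is recovered.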


Finally, to account for colored noise, one has to include the noise variance (18) in the
eigen-gap test of Section II-C by dividing each eigenvalue by the corresponding noise variance.
The resulting rank estimator is given by

$$ \hat{K} = \min \{ k \in \{1, \dots, M\} : \Delta_{k+1} < d_N \} \qquad (20) $$

with

$$ \Delta_k = \frac{\lambda_k}{\hat{\sigma}_k^2} - \frac{\lambda_{k+1}}{\hat{\sigma}_{k+1}^2}, \qquad d_N = \frac{\psi_N}{N^{2/3}}. \qquad (21) $$

The resulting algorithm is summarized in Algo. 1.


Algorithm 1 Proposed algorithm

1: Compute the sample covariance matrix $\hat{R}_Y$
2: Estimate the noise covariance matrix $\hat{\Sigma}$
3: Compute the matrix $V$ containing the eigenvectors of $\hat{R}_Y$ (sorted in descending order
of the eigenvalues)
4: Compute the matrix $W$ containing the eigenvectors of $\hat{R}_S = \hat{R}_Y - \hat{\Sigma}$ (sorted in
descending order of the eigenvalues)
5: Compute $\lambda_k$, $k \in \{1, \dots, L\}$, the eigenvalues of $\hat{R}_Y$ (sorted in descending order)
6: Compute $\hat{\sigma}_k^2$ according to (18)
7: Compute $\Delta_{k+1}$ and $d_N$ according to (21)
8: Estimate the number of endmembers $\hat{R} = \hat{K} + 1$ by evaluating (20)
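
The steps of Algorithm 1 can be sketched as follows. This is an illustrative reading of the algorithm rather than a reference implementation: the function names are hypothetical, the noise covariance of step 2 is passed in as an argument, the sample covariance is computed with mean removal through `numpy.cov`, and returning $M$ when no gap falls below $d_N$ is an assumed fallback.

```python
import numpy as np

def eigen_gap_threshold(N):
    """d_N = psi_N / N^(2/3) with psi_N = 4 sqrt(2 log(log N))."""
    return 4.0 * np.sqrt(2.0 * np.log(np.log(N))) / N ** (2.0 / 3.0)

def per_component_noise_variance(V, W, noise_cov):
    """Eq. (18): v_k^T Sigma w_k / (v_k^T w_k) for every pair of columns k."""
    num = np.einsum("ik,ij,jk->k", V, noise_cov, W)
    den = np.einsum("ik,ik->k", V, W)
    return num / den

def ega_estimate(Y, noise_cov, max_rank=None):
    """Estimate the number of endmembers from an L x N data matrix Y."""
    L, N = Y.shape
    R_y = np.cov(Y)                                   # step 1 (mean removed)
    lam, V = np.linalg.eigh(R_y)                      # eigh returns ascending order
    lam, V = lam[::-1], V[:, ::-1]                    # steps 3 and 5
    _, W = np.linalg.eigh(R_y - noise_cov)            # step 4
    W = W[:, ::-1]
    sigma2 = per_component_noise_variance(V, W, noise_cov)   # step 6
    scaled = lam / sigma2
    delta = scaled[:-1] - scaled[1:]                  # delta[k - 1] stores Delta_k
    d_N = eigen_gap_threshold(N)                      # step 7
    M = L - 2 if max_rank is None else min(max_rank, L - 2)
    # Step 8: first k with Delta_{k+1} < d_N, falling back to M if none exists.
    K = next((k for k in range(1, M + 1) if delta[k] < d_N), M)
    return K + 1
```

The ratio in `per_component_noise_variance` is invariant to the sign of each eigenvector, so the arbitrary signs returned by `numpy.linalg.eigh` do not affect the result.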


IV. Simulation results on synthetic data 

This section analyzes the performance of the proposed eigen-gap approach (EGA) with
simulated data. The proposed approach is compared to the NWHFC approach^3 since it has been
shown to provide better results than the approaches based on information criteria such
as AIC [16] and MDL [17], [18]. Note that the NWHFC algorithm requires the definition
of the false alarm probability $P_f$. We have considered in our experiments three values $P_f \in
\{10^{-3}, 10^{-4}, 10^{-5}\}$ denoted by NWHFC1, NWHFC2 and NWHFC3, respectively. The EGA
is also compared to the RMT approach proposed in [20] since it uses similar theoretical
tools. The well-known Hysime algorithm is also investigated since it has been used
in many studies [20], [34]. The considered datasets were constructed based on the USGS
spectral library. As in [20], we considered 20 minerals that vary widely (some
spectra are similar, others are different, some spectra have low amplitude, ...) as shown in Fig. 1.
The abundances were drawn uniformly in the simplex defined by the PSTO constraints
using a Dirichlet distribution. The following sections present three kinds of results: (i)
robustness with respect to noise, (ii) impact of the image size and (iii) performance with
respect to the number of endmembers. Unless varied by the experimental setup (i), (ii) or (iii),
we considered the following parameters: $N = 10^4$ pixels, $L = 224$ bands, SNR = 25 dB and
$R = 4$ endmembers. We performed 50 Monte-Carlo simulations for each experiment.

Fig. 1. Spectra from USGS library.

^3 The NWHFC is obtained by preceding the HFC algorithm by a noise whitening step. We have considered the HFC
algorithm available in: http://www.ehu.es/computationalintelligence/index.php/Endmember_Induction_Algorithms


A. Robustness to noise 

This section studies the robustness of the proposed approach with respect to noise. Two
experiments were considered. The first experiment studies the performance of the different
algorithms when the noise variance is inaccurately estimated. Indeed, all the algorithms
include a noise estimation step that may introduce some errors. Therefore, we simulated
synthetic images using $R = 4$ fixed endmembers (chosen from the 20 spectra) with an i.i.d.
Gaussian noise with variance $\sigma^2$ (corresponding to SNR = 25 dB). Then, we applied the
described algorithms when considering a noise variance given by $\sigma^2 (1 + \epsilon)$, to simulate an
error in the noise estimation step. Fig. 2 shows the obtained accuracy (in percent) of the
estimated number of endmembers when varying $\epsilon$ (the accuracy represents the percentage
of correct estimates). This figure shows the robustness of the algorithms with respect to noise
overestimation. However, observe that both RMT and Hysime algorithms are sensitive to noise
variance under-estimation since they provide incorrect results for $\epsilon < -0.1$ and $\epsilon < -0.4$,
respectively. The results show the robustness of the proposed EGA since it provides an
accuracy higher than 90% for $\epsilon > -0.5$. The best performance was obtained with the NWHFC
approach. This algorithm applies a Neyman-Pearson test on the difference between covariance
and correlation eigenvalues. Therefore, the additive noise perturbation introduced by $(1 + \epsilon)$
is eliminated (or greatly reduced). The proposed EGA is more robust than RMT and Hysime
to noise estimation errors, which is of great interest especially when considering real data.

The second experiment considers the effect of the noise correlation between the different
spectral bands, denoted as spectral correlation, which is generally observed in real data [22].
To simulate data with spectral correlation, we considered the following covariance





Fig. 2. Robustness of the algorithms with respect to the accuracy of the noise estimation. 


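structure^4 when band $j$ is correlated with band $j + 1$ with a correlation coefficient $C$

$$ \Sigma = \begin{bmatrix}
\sigma_1^2 & & & & & 0 \\
& \ddots & & & & \\
& & \sigma_j^2 & C \sigma_j \sigma_{j+1} & & \\
& & C \sigma_j \sigma_{j+1} & \sigma_{j+1}^2 & & \\
& & & & \ddots & \\
0 & & & & & \sigma_L^2
\end{bmatrix} \qquad (22) $$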


This covariance structure was chosen to compare our results with [33], which used a similar
matrix structure. We first varied the number of correlated spectral bands when considering
a correlation coefficient $C = 0.5$. The correlated bands are chosen randomly from the
set $\{1, \dots, L - 1\}$. For all the algorithms, we considered the noise estimation algorithm
described in Section III-A. Fig. 3 (top) shows a linear evolution of $\hat{R}$ w.r.t. the number of
correlated bands for both RMT and Hysime (which is in agreement with the results of [33]).
Both EGA and NWHFC show a stable result as the number of correlated bands increases.
Note that EGA presents the best results. In a second study, we varied $C$ when considering
10 correlated bands (drawn randomly between 1 and $L$). The results are shown in Fig. 3
(bottom). The EGA shows the best performance except for $C > 0.8$, where NWHFC has
more stable results. To summarize, the obtained results illustrate the robustness of the
EGA with respect to noise estimation error and noise correlation. It is more robust to noise
correlation than RMT, Hysime and NWHFC. Both EGA and NWHFC are robust to noise
estimation error.

^4 We represented the covariance structure for one correlated band j. The case of multiple correlated bands can be obtained
by considering multiple values for j.
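
The correlated noise covariance used in this experiment can be sketched as follows, assuming that the off-diagonal entry linking bands $j$ and $j + 1$ equals $C \sigma_j \sigma_{j+1}$, which is what a correlation coefficient $C$ between the two bands implies. The function name and its 0-based indexing convention are assumptions made for the example.

```python
import numpy as np

def correlated_noise_covariance(sigma, pairs, C):
    """Diagonal covariance with bands j and j + 1 correlated for every j in pairs.

    sigma : per-band noise standard deviations (length L)
    pairs : 0-based indices j of the first band of each correlated pair (j < L - 1)
    C     : correlation coefficient between band j and band j + 1
    """
    sigma = np.asarray(sigma, dtype=float)
    Sigma = np.diag(sigma ** 2)
    for j in pairs:
        Sigma[j, j + 1] = Sigma[j + 1, j] = C * sigma[j] * sigma[j + 1]
    return Sigma

Sigma = correlated_noise_covariance(2.0 * np.ones(5), pairs=[1], C=0.5)
```

When several adjacent indices are selected together with a large $C$, the resulting matrix is not guaranteed to remain positive definite, so checking its smallest eigenvalue before sampling noise from it is a sensible precaution.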





Fig. 3. Estimated R with respect to (top) the number of correlated bands and (bottom) the correlation coefficient.
The actual number of endmembers is R = 4. Curves: EGA, RMT, HySime, NWHFC1, NWHFC2 and NWHFC3.


B. Robustness to the image size 


As described in Section II-C, the EGA is valid when $\gamma_1 > \dots > \gamma_K > 1 + \sqrt{c}$, with
$c = L/N$. While this condition suggests that the image size should be large to obtain good
results, we will see in this section that acceptable results are also obtained for small images.
The simulated images were obtained by using the previous $R = 4$ endmembers and an i.i.d.
Gaussian noise with SNR = 25 dB. Table I shows the median of the estimated $R$ over 50
Monte-Carlo runs, and the obtained accuracy indicated between brackets, when varying the
image size. All the algorithms provided poor results for $N = 100$. However, both EGA and
NWHFC provided accurate estimates for $N \geq 400$ pixels. Hysime offered accurate estimates
for $N \geq 2500$ pixels while RMT required the largest number of pixels, $N = 10000$.

Note that the obtained results are in agreement with those of Section IV-A. Indeed, the
estimated noise covariance $\hat{\Sigma}$ in (16) is sensitive to the number of pixels, that is, a reduced
number of pixels increases the estimation error of $\hat{\Sigma}$. Therefore, algorithms that are robust
to noise estimation errors are expected to perform better when reducing the image size, which is
observed in Table I. To conclude, the results of this section show that EGA provides accurate
results even for small images.


TABLE I

Estimated R with respect to the image size N. Estimated median value and the accuracy in percent
between brackets.

Method    N = 10^2   N = 20^2   N = 30^2   N = 50^2   N = 10^4
EGA       100 (0)    4 (86)     4 (100)    4 (100)    4 (100)
RMT       100 (0)    63 (0)     23 (0)     8 (0)      4 (100)
HySime    100 (0)    98 (0)     29 (0)     4 (100)    4 (100)
NWHFC1    67 (0)     4 (100)    4 (100)    4 (100)    4 (100)
NWHFC2    66 (0)     4 (100)    4 (100)    4 (100)    4 (100)
NWHFC3    66 (0)     4 (100)    4 (100)    4 (100)    4 (100)

C. Performance 

This section studies the performance of the EGA when varying the number of endmembers,
the noise level and the noise shape, as in [34]. The synthetic images were generated
using the standard parameters described in Section IV. For each Monte-Carlo simulation,
the endmembers were randomly chosen in a database containing 20 minerals. Moreover, and
similarly to [34], we considered two noise shapes w.r.t. spectral bands: (i) a constant
shape w.r.t. spectral bands which represents an i.i.d. Gaussian noise and (ii) a Gaussian shape
for the noise variance w.r.t. spectral bands defined as follows

$$ \sigma_\ell^2 = \sigma^2 \frac{\exp\left[ -\frac{(\ell - L/2)^2}{2 \eta^2} \right]}{\sum_{i=1}^{L} \exp\left[ -\frac{(i - L/2)^2}{2 \eta^2} \right]}, \quad \ell = 1, \dots, L \qquad (23) $$
where $\sigma^2$ is fixed according to the required SNR and $\eta$ controls the width of the Gaussian
shape of the noise variance. Table II shows the obtained results with an i.i.d. Gaussian noise.
This table shows that all the algorithms provide good estimates for all SNRs when considering
a reduced number of endmembers R < 5. However, the NWHFC algorithm shows poor results
for large values of R even for high SNRs. Note that Hysime, RMT and EGA algorithms
provide good estimates for high SNR (SNR > 25 dB) while the Hysime performance decreases
for low SNR. Note finally that RMT and EGA provide similar performance. Table III shows
the results when considering a Gaussian shape for the noise variance. This table shows poor
results for NWHFC even for small values of R. However, the results are slightly improved
when using the actual noise covariance matrix instead of the estimated one (see results
between brackets). The Hysime, RMT and EGA algorithms have a behavior similar to that shown
in Table II, i.e., the Hysime performance decreases for low SNR while the RMT and EGA
results are slightly better. These results show the accuracy of the EGA that provides equal
or better results than the state-of-the-art algorithms.
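
The Gaussian-shaped noise variance profile (23) can be generated with the short sketch below. The function name and the value of $\eta$ used in the example are assumptions; the text does not state which $\eta$ was used in the experiments.

```python
import numpy as np

def gaussian_shaped_noise_variances(L, sigma2, eta):
    """Per-band noise variances of (23); they sum to sigma2 over the L bands."""
    ell = np.arange(1, L + 1)
    shape = np.exp(-((ell - L / 2.0) ** 2) / (2.0 * eta ** 2))
    return sigma2 * shape / shape.sum()

# Hypothetical setting: 224 bands, total noise power 1e-3, width eta = 18.
variances = gaussian_shaped_noise_variances(L=224, sigma2=1e-3, eta=18.0)
```

Because of the normalisation in (23), $\sigma^2$ acts as the total noise power over the bands, and the largest variance is found at band $\ell = L/2$.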


V. Simulation results on real data 

This section evaluates the EGA performance for three real hyperspectral images. The first
image was acquired in 2010 by the Hyspex hyperspectral scanner over Villelongue, France (00°
03'W and 42°57'N). The dataset contains L = 160 spectral bands recorded from the visible to
near infrared (400-1000 nm) with a spatial resolution of 0.5 m [35]. The considered subset
contains 702 x 1401 pixels and is mainly composed of forested areas [14], [29] as shown
in RGB colors in Fig. 4(a). According to [35], the ground truth of this image contains 12
tree species that are: ash tree, oak tree, hazel tree, locust tree, chestnut tree, lime tree, maple
tree, beech tree, birch tree, willow tree, walnut tree and fern. Consequently, the number
of endmembers is expected to be at least equal to 12. Table IV (first column) shows the
experimental results. The EGA estimated R = 12 endmembers, which is in agreement with
the ground truth information. The RMT, HFC and NWHFC provided a larger estimate while
Hysime underestimated the number of endmembers. Note that the results obtained with HFC
and NWHFC were expected since they estimate not only the endmember sources but also
the interferences.

The second image was acquired by the airborne visible/infrared imaging spectrometer
(AVIRIS) over the Cuprite mining site, Nevada, in 1997. This image contains 182 spectral
bands with a spectral resolution of 10 nm acquired in the 0.4-2.5 µm region (the water
TABLE II

Median of the estimated R for data corrupted by white noise (50 Monte Carlo simulations). For
NWHFC, we show between brackets the results when using the ground-truth noise covariance
matrix.

SNR     Method    R = 3    R = 5    R = 10   R = 15
15 dB   EGA       3        5        7        8
        RMT       3        5        8        8
        HySime    3        4        5        4
        NWHFC1    3 (3)    4 (4)    3 (3)    3 (3)
        NWHFC2    3 (3)    4 (4)    3 (3)    3 (3)
        NWHFC3    3 (3)    4 (4)    3 (3)    3 (3)
25 dB   EGA       3        5        10       12
        RMT       3        5        10       12
        HySime    3        5        8        9
        NWHFC1    3 (3)    4 (4)    5 (5)    5 (5)
        NWHFC2    3 (3)    4 (4)    5 (5)    5 (5)
        NWHFC3    3 (3)    4 (4)    5 (5)    4 (4)
35 dB   EGA       3        5        10       15
        RMT       3        5        10       15
        HySime    3        5        10       13
        NWHFC1    3 (3)    4 (4)    7 (7)    7 (7)
        NWHFC2    3 (3)    4 (4)    7 (7)    6 (6)
        NWHFC3    3 (3)    4 (4)    6 (6)    6 (6)
50 dB   EGA       3        5        10       15
        RMT       3        5        10       15
        HySime    3        5        10       14
        NWHFC1    3 (3)    4 (4)    7 (7)    9 (9)
        NWHFC2    3 (3)    4 (4)    7 (7)    8 (9)
        NWHFC3    3 (3)    4 (4)    7 (7)    8 (8)

absorption bands 1-5, 105-115, 150-170 and 220-224 were removed) and a spatial
resolution of 20 m [20], [36]. The considered image subset contains 351 x 351 pixels and
is shown in RGB colors in Fig. 4(b). This image has been widely studied and ground
truth information is available. According to USGS^5, this image contains at least 18 minerals
[37]. The considered algorithms were applied to this image leading to the results in Table

^5 Available: http://speclab.cr.usgs.gov/cuprite95.tgif.2.2um_map.gif
TABLE III

Median of the estimated R for data corrupted by colored noise (Gaussian shape) with 50 Monte
Carlo simulations. For NWHFC, we show between brackets the results when using the ground-truth
noise covariance matrix.

SNR     Method    R = 3    R = 5    R = 10   R = 15
15 dB   EGA       3        5        6        6
        RMT       3        5        5        5
        HySime    3        4        5        5
        NWHFC1    3 (3)    5 (5)    8 (7)    9 (9)
        NWHFC2    3 (3)    5 (5)    8 (7)    8 (8)
        NWHFC3    3 (3)    5 (4)    7 (7)    8 (8)
25 dB   EGA       3        5        9        10
        RMT       3        5        8        9
        HySime    3        5        8        8
        NWHFC1    3 (3)    5 (5)    8 (7)    9 (10)
        NWHFC2    3 (3)    5 (5)    8 (7)    9 (9)
        NWHFC3    3 (3)    5 (4)    8 (7)    8 (9)
35 dB   EGA       3        5        10       14
        RMT       3        5        10       13
        HySime    3        5        10       13
        NWHFC1    3 (3)    5 (5)    9 (7)    11 (9)
        NWHFC2    3 (3)    5 (4)    8 (7)    10 (8)
        NWHFC3    3 (3)    5 (4)    8 (7)    9 (8)
50 dB   EGA       3        5        10       15
        RMT       3        5        10       15
        HySime    3        5        10       14
        NWHFC1    7 (3)    9 (5)    14 (7)   18 (9)
        NWHFC2    7 (3)    9 (5)    12 (7)   16 (9)
        NWHFC3    6 (3)    7 (5)    11 (7)   15 (9)

IV (second column). All the algorithms estimated a number of endmembers of at least 18.
EGA provided a more realistic value than RMT, which suffers from the spectral correlation
when considering a multiple regression noise estimation algorithm [20]. However, the results
obtained with HySime, HFC and NWHFC were in better agreement with the ground truth
(closer to 18 endmembers).

The third image was also acquired by the AVIRIS sensor, in June 1992 over an agricultural
area of northwestern Indiana (Indian Pines). The considered dataset contains
145 × 145 pixels, 185 spectral bands with the same spectral resolution and spectral range
as the Cuprite image (the water absorption bands 1–3, 103–113, 148–166 and 221–224
were removed) and a spatial resolution of 17 m [?]. As shown in Fig. 4(c), the observed
image is a mixture of agriculture and forestry. According to the ground truth information
[?], [34], this image contains at least 16 endmembers: alfalfa, corn-notill, corn-mintill,
corn, grass-pasture, grass-trees, grass-pasture-mowed, hay-windrowed, oats, soybean-notill,
soybean-mintill, soybean-clean, wheat, woods, buildings-grass-trees-drives and
stone-steel-towers. Therefore, the estimated number should be at least 16. Table IV (third
column) reports the experimental results. Except for HySime, which underestimated the number
of endmembers, and HFC, which overestimated it, all algorithms detected 18 components in the
Indian Pines image.

The experimental results provided in this section illustrate the accuracy of EGA when
applied to real data acquired by different sensors (AVIRIS and Hyspex) and containing
different physical elements (trees, grass and minerals).
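
As a quick cross-check of the real-data discussion, the short Python sketch below compares the
Cuprite and Indian Pines estimates reported in Table IV with the ground-truth lower bounds quoted
in the text (at least 18 minerals and at least 16 endmembers, respectively). Only the numbers come
from the paper; the dictionary layout, the helper names and the reading of the method subscripts
as 1, 2, 3 are assumptions made for illustration.

```python
# Estimates reported in Table IV, as (Cuprite, Indian Pines).
# Method labels with suffixes 1, 2, 3 are an assumed reading of the subscripts.
ESTIMATES = {
    "EGA": (26, 18),
    "RMT": (31, 18),
    "HySime": (20, 14),
    "HFC1": (22, 26),
    "HFC2": (21, 23),
    "HFC3": (18, 22),
    "NWHFC1": (22, 18),
    "NWHFC2": (21, 18),
    "NWHFC3": (19, 18),
}
IMAGES = ("Cuprite", "Indian Pines")
# Ground-truth lower bounds quoted in the text for the two images.
LOWER_BOUNDS = (18, 16)


def below_bound(estimates, bounds, images):
    """List, per image, the methods whose estimate is below the lower bound."""
    result = {image: [] for image in images}
    for method, values in estimates.items():
        for image, value, bound in zip(images, values, bounds):
            if value < bound:
                result[image].append(method)
    return result


def excess_over_bound(estimates, bounds, images):
    """Return, per method and image, the estimate minus the lower bound."""
    return {
        method: {
            image: value - bound
            for image, value, bound in zip(images, values, bounds)
        }
        for method, values in estimates.items()
    }


if __name__ == "__main__":
    print(below_bound(ESTIMATES, LOWER_BOUNDS, IMAGES))
    for method, diff in excess_over_bound(ESTIMATES, LOWER_BOUNDS, IMAGES).items():
        print(method, diff)
```

With these numbers, no method falls below the Cuprite bound, and only HySime falls below the
Indian Pines bound, which matches the discussion above.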


TABLE IV

Estimated R for real images.

Method    Madonna    Cuprite    Indian Pines
EGA       12         26         18
RMT       17         31         18
HySime    9          20         14
HFC1      42         22         26
HFC2      34         21         23
HFC3      31         18         22
NWHFC1    16         22         18
NWHFC2    14         21         18
NWHFC3    14         19         18

Available: http://dynamo.ecn.purdue.edu/~biehl/MultiSpec/.
Fig. 4. Real images: (a) Hyspex Madonna image, (b) AVIRIS Cuprite scene and (c) AVIRIS Indian Pines.

VI. Conclusions 

This paper proposed an unsupervised algorithm for determining the number of endmembers
in hyperspectral images. This algorithm consists of two steps: noise estimation
and determination of the number of endmembers. Noise estimation was achieved by a multiple
regression estimation method, although other algorithms could be investigated. The second step
was performed by thresholding the difference between successive eigenvalues of the sample
covariance matrix. The resulting algorithm is non-parametric (it does not require any
user-determined parameter) and efficient in the presence of i.i.d. and colored noise. Synthetic
experiments showed a robust behavior of EGA with respect to noise estimation errors, noise
correlations and noise levels. It also showed good performance when considering different
image sizes. The results obtained on real images confirmed the accuracy of the proposed
algorithm, which showed comparable or better results than some state-of-the-art algorithms. Future
work includes the study of robust estimation for the pixel covariance matrix. Considering the
recent method proposed in [38] for source detection is also an interesting issue that
deserves investigation.
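
To make the second step more concrete, here is a minimal Python sketch of eigenvalue-gap
thresholding. It should not be read as the EGA itself: the paper's threshold comes from random
matrix theory and is not reproduced in this section, so the `threshold` argument, the function
names and the toy eigenvalues below are all assumptions made for illustration. The eigenvalues
are assumed to come from the sample covariance matrix after the noise estimation step.

```python
def eigen_gaps(eigenvalues):
    """Return the gaps between successive eigenvalues, sorted in decreasing order."""
    lam = sorted(eigenvalues, reverse=True)
    return [lam[k] - lam[k + 1] for k in range(len(lam) - 1)]


def estimate_num_endmembers(eigenvalues, threshold):
    """Count the leading eigenvalues that precede the first small gap.

    `threshold` is a hypothetical input: in the paper it is set automatically,
    whereas here the caller has to supply it.
    """
    gaps = eigen_gaps(eigenvalues)
    for k, gap in enumerate(gaps):
        if gap < threshold:
            # The eigenvalues from position k onward are treated as noise.
            return k
    # No small gap was found: every eigenvalue but the last is counted.
    return len(gaps)


if __name__ == "__main__":
    # Toy spectrum: three large eigenvalues followed by a nearly flat noise floor.
    toy_eigenvalues = [9.0, 6.0, 3.5, 1.05, 1.02, 1.0, 0.98]
    print(estimate_num_endmembers(toy_eigenvalues, threshold=0.5))  # -> 3
```

Stopping at the first small gap keeps the sketch short, but it would underestimate the count
if two signal eigenvalues happened to be closer together than the chosen threshold.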